Statistics as Random Variables: The Sampling Distribution
MATH003 Lesson 4
In statistical inference, we move from observing individual data points to analyzing a **statistic**: a function $Y = h(X_1, X_2, \dots, X_n)$ of the sample. Because the sample consists of random variables, the statistic is itself a random variable, and its probability distribution is called the **sampling distribution**.

The Statistic as a Mapping

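A statistic is formally defined as a function $h: \mathbb{R}^n \to \mathbb{R}$. The probability that the statistic falls in a set $B$ is the probability that the sample falls in the pre-image of $B$ under $h$, that is, $P(Y \in B) = P\big((X_1, \dots, X_n) \in h^{-1} B\big)$, where the pre-image is: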

$$h^{-1} B = \{(x_1, x_2, \dots, x_n) : h(x_1, x_2, \dots, x_n) \in B\}$$
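As a small illustration of the pre-image, the sketch below enumerates $h^{-1} B$ when each coordinate takes values in a finite set. The function name `preimage`, the support `(1, 2, 3)`, and the choice of $h$ as a sum are assumptions made for illustration only.

```python
from itertools import product

def preimage(h, support, n, B):
    """Return all sample points (x_1, ..., x_n) in support^n with h(x) in B."""
    return [x for x in product(support, repeat=n) if h(*x) in B]

# Hypothetical example: h is the sum of two coordinates and B = {4}.
points = preimage(lambda x1, x2: x1 + x2, support=(1, 2, 3), n=2, B={4})
print(points)  # [(1, 3), (2, 2), (3, 1)]
```

Every point in the returned list is mapped by $h$ into $B$, so the probability that the statistic lands in $B$ is the total probability of these points.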

The I.I.D. Foundation

For a sample of i.i.d. (independent and identically distributed) random variables, the joint probability of a specific sample point $(x_1, \dots, x_n)$ is the product of the marginal probabilities: $p(x_1)p(x_2)\dots p(x_n)$. This product is the weight attached to each sample point. Summing these weights over all sample points that map to a given value yields the probability that the statistic takes that value.
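Written out for a discrete sample, and combining the product weights with the pre-image defined earlier, the probability mass function of $Y$ can be expressed as follows (the symbol $p_Y$ is introduced here for illustration):

$$p_Y(y) = P(Y = y) = \sum_{(x_1, \dots, x_n) \in h^{-1}\{y\}} p(x_1)p(x_2)\cdots p(x_n)$$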

Example 4.1.1: The Geometric Mean

Consider a discrete population where $p_X(1) = 1/2$, $p_X(2) = 1/4$, and $p_X(3) = 1/4$. We draw a sample of size $n=2$ ($X_1, X_2$) and define our statistic as the geometric mean: $Y_2 = (X_1 X_2)^{1/2}$.

To find the distribution of $Y_2$, we list all 9 possible pairs $(x_1, x_2)$, compute the joint probability of each, and group the pairs that yield the same value of $Y_2$:

| Pair(s) $(x_1, x_2)$ | Probability $p_X(x_1)p_X(x_2)$ | $Y_2 = \sqrt{x_1 x_2}$ |
|---|---|---|
| (1, 1) | 1/4 | 1.000 |
| (1, 2), (2, 1) | 1/8 + 1/8 = 1/4 | 1.414 |
| (1, 3), (3, 1) | 1/8 + 1/8 = 1/4 | 1.732 |
| (2, 2) | 1/16 | 2.000 |
| (2, 3), (3, 2) | 1/16 + 1/16 = 1/8 | 2.449 |
| (3, 3) | 1/16 | 3.000 |
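The enumeration in Example 4.1.1 can be checked mechanically. The following is a minimal sketch, not a prescribed method: the helper name `exact_distribution` is hypothetical, and exact fractions are used so that the probabilities match the hand calculation without rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import sqrt

# Population pmf from Example 4.1.1.
p_X = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

def exact_distribution(pmf, h, n):
    """Exact pmf of Y = h(X_1, ..., X_n) for an i.i.d. sample from a finite pmf."""
    dist = defaultdict(Fraction)
    for point in product(pmf, repeat=n):  # all sample points (x_1, ..., x_n)
        weight = Fraction(1)
        for x in point:
            weight *= pmf[x]  # i.i.d.: joint probability is the product of marginals
        dist[h(*point)] += weight  # accumulate weight on the value of the statistic
    return dict(dist)

# Geometric mean of a sample of size 2. Pairs such as (1, 2) and (2, 1) give the
# same integer product, so they produce the identical float key and are grouped.
dist = exact_distribution(p_X, lambda x1, x2: sqrt(x1 * x2), n=2)
for y, prob in sorted(dist.items()):
    print(f"y = {y:.3f}  P(Y_2 = y) = {prob}")
```

The six probabilities sum to 1, which is a quick sanity check on any exact sampling distribution obtained this way.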

Exact vs. Asymptotic Distributions

Before moving to limit theorems like the Central Limit Theorem (CLT), we must master the "Exact Distribution": the specific probability mass or density function of a statistic for a fixed, finite $n$. Deriving it by hand is usually practical only when $n$ is small. When the analytic form becomes intractable, we resort to numerical simulation, such as **Monte Carlo approximations**.
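To make the Monte Carlo idea concrete, the sketch below estimates the distribution of the geometric mean from Example 4.1.1 by drawing many i.i.d. samples and recording relative frequencies. The function name `monte_carlo_distribution`, the number of trials, and the seed are assumptions chosen for illustration.

```python
import random
from collections import Counter
from math import sqrt

def monte_carlo_distribution(values, probs, h, n, trials, seed=0):
    """Estimate the pmf of Y = h(X_1, ..., X_n) by repeated i.i.d. sampling."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter()
    for _ in range(trials):
        # Draw n independent values from the population pmf.
        sample = rng.choices(values, weights=probs, k=n)
        counts[h(*sample)] += 1
    # Relative frequencies approximate the sampling distribution.
    return {y: c / trials for y, c in counts.items()}

estimate = monte_carlo_distribution(
    values=[1, 2, 3], probs=[0.5, 0.25, 0.25],
    h=lambda x1, x2: sqrt(x1 * x2), n=2, trials=100_000,
)
for y, freq in sorted(estimate.items()):
    print(f"y = {y:.3f}  estimated P(Y_2 = y) = {freq:.4f}")
```

With enough trials the estimated frequencies should sit close to the exact values 1/4, 1/4, 1/4, 1/16, 1/8, 1/16. The same function applies unchanged to statistics and sample sizes for which exact enumeration is impractical.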

🎯 Core Principle
A sampling distribution is the distribution of a random variable corresponding to a function of some i.i.d. sequence. It is the bridge between raw data and scientific inference.